Results 1 - 20 of 265
1.
medRxiv ; 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38559087

ABSTRACT

Background: Recent studies have challenged assumptions about slow correction of severe hyponatremia and have shown that rapid correction is associated with shorter hospital length of stay. However, the confounding effect of admission diagnosis has not been fully explored. The objective of this study was to determine whether rapid correction is still associated with shorter length of stay when controlling for admission diagnosis. Methods: This retrospective cohort study is based on the Medical Information Mart for Intensive Care, including data from both MIMIC-III (2001-2012) and MIMIC-IV (2008-2019). Patients were identified who presented to the hospital with initial sodium <120 mEq/L and were categorized according to total sodium correction achieved in the first day (<6 mEq/L; 6-10 mEq/L; >10 mEq/L). Linear regression was used to assess for an association between correction rate and hospital length of stay, and to determine if this association was significant when controlling for admission diagnosis classifications based on diagnosis related groups (DRGs). Results: There were 636 patients included in this study. Median [IQR] hospital length of stay was 7 [4, 11] days. Patients had a median [IQR] initial sodium value of 117 [114, 118] mEq/L and final sodium value of 124 [119, 128] mEq/L. In a univariate linear regression, the highest rate of correction (>10 mEq/L) was associated with a shorter length of stay than a moderate rate of correction (coef. -2.363, 95% CI [-4.710, -0.017], p=0.048), but the association was not significant when controlling for admission diagnosis group (coef. -1.685, 95% CI [-3.836, 0.467], p=0.125). Conclusions: Faster sodium correction was not associated with shorter length of stay when controlling for admission diagnosis categories, suggesting that the disease state confounds this association. While some patients may be discharged earlier if sodium is corrected more rapidly, others may not benefit or may be harmed by this strategy.
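The three first-day correction groups named in the Methods can be sketched as a small helper. This is an illustrative sketch only, not the authors' code: the function name `correction_category` and its arguments are hypothetical, and the boundary handling simply follows the ranges stated in the abstract (<6; 6-10; >10 mEq/L).

```python
def correction_category(initial_na: float, day1_na: float) -> str:
    """Bin total first-day sodium correction (mEq/L) into the abstract's three groups.

    Both arguments are hypothetical inputs: the admission sodium value and
    the sodium value at the end of the first day.
    """
    delta = day1_na - initial_na  # total correction achieved in the first day
    if delta < 6:
        return "<6 mEq/L"
    if delta <= 10:
        return "6-10 mEq/L"
    return ">10 mEq/L"
```

The resulting label could then serve as a categorical predictor in a regression on length of stay, with the 6-10 mEq/L group as the reference level, as the reported coefficients suggest.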

2.
Article in English | MEDLINE | ID: mdl-38578616

ABSTRACT

OBJECTIVE: To investigate the consistency and reliability of medication recommendations provided by ChatGPT for common dermatological conditions, highlighting the potential for ChatGPT to offer second opinions in patient treatment while also delineating possible limitations. MATERIALS AND METHODS: In this mixed-methods study, we used survey questions in April 2023 to obtain drug recommendations from ChatGPT, which were compared with data from secondary databases, that is, Taiwan's National Health Insurance Research Database and a US medical center database, and validated by dermatologists. The methodology included preprocessing queries, executing them multiple times, and evaluating ChatGPT responses against the databases and dermatologists. The ChatGPT-generated responses were analyzed statistically in a disease-drug matrix, considering disease-medication associations (Q-value) and expert evaluation. RESULTS: ChatGPT achieved a high 98.87% dermatologist approval rate for common dermatological medication recommendations. We evaluated its drug suggestions using the Q-value, showing that human expert validation agreement surpassed Q-value cutoff-based agreement. When the cutoff value for disease-medication associations was varied, a cutoff of 3 achieved 95.14% accurate prescriptions, 5 yielded 85.42%, and 10 resulted in 72.92%. While ChatGPT offered accurate drug advice, it occasionally included incorrect ATC codes, leading to issues like incorrect drug use and type, nonexistent codes, repeated errors, and incomplete medication codes. CONCLUSION: ChatGPT provides medication recommendations as a second opinion in dermatology treatment, but its reliability and comprehensiveness need refinement for greater accuracy. In the future, integrating a medical domain-specific knowledge base for training and ongoing optimization will enhance the precision of ChatGPT's results.

3.
medRxiv ; 2024 Mar 22.
Article in English | MEDLINE | ID: mdl-38562711

ABSTRACT

Background: Health research that significantly impacts global clinical practice and policy is often published in high-impact factor (IF) medical journals. These outlets play a pivotal role in the worldwide dissemination of novel medical knowledge. However, researchers identifying as women and those affiliated with institutions in low- and middle-income countries (LMIC) have been largely underrepresented in high-IF journals across multiple fields of medicine. To evaluate disparities in gender and geographical representation among authors who have published in any of five top general medical journals, we conducted scientometric analyses using a large-scale dataset extracted from the New England Journal of Medicine (NEJM), Journal of the American Medical Association (JAMA), The British Medical Journal (BMJ), The Lancet, and Nature Medicine. Methods: Author metadata from all articles published in the selected journals between 2007 and 2022 were collected using the DimensionsAI platform. The Genderize.io API was then utilized to infer each author's likely gender based on their extracted first name. The World Bank country classification was used to map countries associated with researcher affiliations to the LMIC or the high-income country (HIC) category. We characterized the overall gender and country income category representation across the medical journals. In addition, we computed article-level diversity metrics and contrasted their distributions across the journals. Findings: We studied 151,536 authors across 49,764 articles published in five top medical journals, over a long period spanning 15 years. On average, approximately one-third (33.1%) of the authors of a given paper were inferred to be women; this result was consistent across the journals we studied. Further, 86.6% of the teams were exclusively composed of HIC authors; in contrast, only 3.9% were exclusively composed of LMIC authors. 
The probability of serving as the first or last author was significantly higher if the author was inferred to be a man (18.1% vs 16.8%, P < .01) or was affiliated with an institution in a HIC (16.9% vs 15.5%, P < .01). Our primary finding reveals that having a diverse team promotes further diversity, within the same dimension (i.e., gender or geography) and across dimensions. Notably, papers with at least one woman among the authors were more likely to also involve at least two LMIC authors (11.7% versus 10.4% in baseline, P < .001; based on inferred gender); conversely, papers with at least one LMIC author were more likely to also involve at least two women (49.4% versus 37.6%, P < .001; based on inferred gender). Conclusion: We provide a scientometric framework to assess authorship diversity. Our research suggests that the inclusiveness of high-impact medical journals is limited in terms of both gender and geography. We advocate for medical journals to adopt policies and practices that promote greater diversity and collaborative research. In addition, our findings offer a first step towards understanding the composition of teams conducting medical research globally and an opportunity for individual authors to reflect on their own collaborative research practices and possibilities to cultivate more diverse partnerships in their work.

5.
Diagn Progn Res ; 8(1): 6, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38561864

ABSTRACT

Acute pancreatitis (AP) is a common, costly acute inflammatory disorder that is increasing in incidence worldwide, with over 300,000 hospitalizations occurring yearly in the United States alone. As its course and outcomes vary widely, a critical knowledge gap in the field has been a lack of accurate prognostic tools to forecast AP patients' outcomes. Despite several published studies in the last three decades, the predictive performance of published prognostic models has been found to be suboptimal. Recently, non-regression machine learning (ML) models have garnered intense interest in medicine for their potential for better predictive performance. Each year, an increasing number of AP models are being published. However, their methodologic quality relating to transparent reporting and risk of bias in study design has never been systematically appraised. Therefore, through collaboration between a group of clinicians and data scientists with appropriate content expertise, we will perform a systematic review of papers published between January 2021 and December 2023 containing artificial intelligence prognostic models in AP. To systematically assess these studies, the authors will leverage the CHARMS checklist, the PROBAST tool for risk of bias assessment, and the most current version of the TRIPOD-AI. (Research Registry: http://www.reviewregistry1727).

6.
PLOS Digit Health ; 3(4): e0000474, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38620047

ABSTRACT

Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare, but also to structural issues in the machine learning for healthcare (MLHC) community, which broadly reward technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large, and offered a series of clearly defined "Impact Challenges" to which the field should orient itself. Drawing on the expertise of a diverse and international group of authors, we engage in a narrative review and examine issues in the research background environment, training processes, evaluation metrics, and deployment protocols which act to limit the real-world applicability of MLHC. Broadly, we seek to distinguish between machine learning ON healthcare data and machine learning FOR healthcare: the former sees healthcare as merely a source of interesting technical challenges, and the latter regards ML as a tool in service of meeting tangible clinical needs. We offer specific recommendations for a series of stakeholders in the field, from ML researchers and clinicians, to the institutions in which they work, and the governments which regulate their data access.

8.
BMJ Health Care Inform ; 31(1)2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38642921

ABSTRACT

OBJECTIVES: To explore the views of intensive care professionals in high-income countries (HICs) and lower-to-middle-income countries (LMICs) regarding the use and implementation of artificial intelligence (AI) technologies in intensive care units (ICUs). METHODS: Individual semi-structured qualitative interviews were conducted between December 2021 and August 2022 with 59 intensive care professionals from 24 countries. Transcripts were analysed using conventional content analysis. RESULTS: Participants had generally positive views about the potential use of AI in ICUs but also reported some well-known concerns about the use of AI in clinical practice and important technical and non-technical barriers to the implementation of AI. Important differences existed between ICUs regarding their current readiness to implement AI. However, these differences were not primarily between HICs and LMICs, but between a small number of ICUs in large tertiary hospitals in HICs, which were reported to have the necessary digital infrastructure for AI, and nearly all other ICUs in both HICs and LMICs, which were reported to neither have the technical capability to capture the necessary data or use AI, nor the staff with the right knowledge and skills to use the technology. CONCLUSION: Pouring massive amounts of resources into developing AI without first building the necessary digital infrastructure foundation needed for AI is unethical. Real-world implementation and routine use of AI in the vast majority of ICUs in both HICs and LMICs included in our study is unlikely to occur any time soon. ICUs should not be using AI until certain preconditions are met.


Subjects
Artificial Intelligence, Critical Care, Humans, Intensive Care Units, Knowledge, Qualitative Research
9.
J Biomed Inform ; 153: 104643, 2024 Apr 14.
Article in English | MEDLINE | ID: mdl-38621640

ABSTRACT

OBJECTIVE: Health inequities can be influenced by demographic factors such as race and ethnicity, proficiency in English, and biological sex. Disparities may manifest as differential likelihood of testing which correlates directly with the likelihood of an intervention to address an abnormal finding. Our retrospective observational study evaluated the presence of variation in glucose measurements in the Intensive Care Unit (ICU). METHODS: Using the MIMIC-IV database (2008-2019), a single-center, academic referral hospital in Boston (USA), we identified adult patients meeting sepsis-3 criteria. Exclusion criteria were diabetic ketoacidosis, ICU length of stay under 1 day, and unknown race or ethnicity. We performed a logistic regression analysis to assess differential likelihoods of glucose measurements on day 1. A negative binomial regression was fitted to assess the frequency of subsequent glucose readings. Analyses were adjusted for relevant clinical confounders, and performed across three disparity proxy axes: race and ethnicity, sex, and English proficiency. RESULTS: We studied 24,927 patients, of which 19.5% represented racial and ethnic minority groups, 42.4% were female, and 9.8% had limited English proficiency. No significant differences were found for glucose measurement on day 1 in the ICU. This pattern was consistent irrespective of the axis of analysis, i.e. race and ethnicity, sex, or English proficiency. Conversely, subsequent measurement frequency revealed potential disparities. Specifically, males (incidence rate ratio (IRR) 1.06, 95% confidence interval (CI) 1.01 - 1.21), patients who identify themselves as Hispanic (IRR 1.11, 95% CI 1.01 - 1.21), or Black (IRR 1.06, 95% CI 1.01 - 1.12), and patients being English proficient (IRR 1.08, 95% CI 1.01 - 1.15) had higher chances of subsequent glucose readings. CONCLUSION: We found disparities in ICU glucose measurements among patients with sepsis, albeit the magnitude was small. 
Variation in disease monitoring is a source of data bias that may lead to spurious correlations when modeling health data.

11.
EBioMedicine ; 102: 105047, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38471396

ABSTRACT

BACKGROUND: It has been shown that AI models can learn race from medical images, leading to algorithmic bias. Our aim in this study was to enhance the fairness of medical image models by eliminating bias related to race, age, and sex. We hypothesise that models may be learning demographics via shortcut learning and combat this using image augmentation. METHODS: This study included 44,953 patients who identified as Asian, Black, or White (mean age, 60.68 years ±18.21; 23,499 women) for a total of 194,359 chest X-rays (CXRs) from the MIMIC-CXR database. CheXpert images from 45,095 patients (mean age 63.10 years ±18.14; 20,437 women), for a total of 134,300 CXRs, were used for external validation. We also collected 1195 3D brain magnetic resonance imaging (MRI) scans from the ADNI database, covering 273 participants (average age 76.97 years ±14.22; 142 females). Deep learning (DL) models were trained on either non-augmented or augmented images and assessed using disparity metrics. The features learned by the models were analysed using task transfer experiments and model visualisation techniques. FINDINGS: In the detection of radiological findings, training a model using augmented CXR images was shown to reduce disparities in error rate among racial groups (-5.45%), age groups (-13.94%), and sex (-22.22%). For Alzheimer's disease (AD) detection, the model trained with augmented MRI images showed 53.11% and 31.01% reductions in disparities in error rate among age and sex groups, respectively. Image augmentation led to a reduction in the model's ability to identify demographic attributes and resulted in the model trained for clinical purposes incorporating fewer demographic features. INTERPRETATION: The model trained using the augmented images was less likely to be influenced by demographic information in detecting image labels.
These results demonstrate that the proposed augmentation scheme could enhance the fairness of interpretations by DL models when dealing with data from patients with different demographic backgrounds. FUNDING: National Science and Technology Council (Taiwan), National Institutes of Health.


Subjects
Benchmarking, Learning, United States, Humans, Female, Aged, Middle Aged, Black People, Brain, Demography
12.
J Thromb Haemost ; 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38554934

ABSTRACT

BACKGROUND: Interventional therapies (ITs) are an emerging treatment modality for pulmonary embolism (PE); however, the degree of racial, sex-based, and sociodemographic disparities in access and timing is unknown. OBJECTIVES: To investigate barriers to access and timing of ITs for PE across the United States. METHODS: A retrospective cohort study utilizing the Nationwide Inpatient Sample from 2016-2020 included adult patients with PE. The use of ITs (mechanical thrombectomy and catheter-directed thrombolysis) was identified via International Classification of Diseases 10th revision codes. Early IT was defined as procedure performed within the first 2 days after admission. RESULTS: A total of 27 805 273 records from the 2016-2020 Nationwide Inpatient Sample database were examined. There were 387 514 (1.4%) patients with PE, with 14 249 (3.6%) of them having undergone IT procedures (11 115 catheter-directed thrombolysis, 2314 thrombectomy, and 780 both procedures). After multivariate adjustment, factors associated with less use of IT included Black race (odds ratio [OR], 0.90; 95% CI, 0.86-0.94; P < .01), Hispanic race (OR, 0.73; 95% CI, 0.68-0.79; P < .01), female sex (OR, 0.88; 95% CI, 0.85-0.91; P < .01), treatment in a rural hospital (OR, 0.49; 95% CI, 0.44-0.54; P < .01), and lack of private insurance (Medicare OR, 0.77; 95% CI, 0.73-0.80; P < .01; Medicaid OR, 0.65; 95% CI, 0.61-0.69; P < .01; no coverage OR, 0.87; 95% CI, 0.82-0.93; P < .01). Among the patients who received IT, 11 315 (79%) procedures were conducted within 2 days of admission and 2934 (21%) were delayed. Factors associated with delayed procedures included Black race (OR, 1.12; 95% CI, 1.01-1.26; P = .04), Hispanic race (OR, 1.52; 95% CI, 1.28-1.80; P < .01), weekend admission (OR, 1.37; 95% CI, 1.25-1.51; P < .01), Medicare coverage (OR, 1.24; 95% CI, 1.10-1.40; P < .01), and Medicaid coverage (OR, 1.29; 95% CI, 1.12-1.49; P < .01). 
CONCLUSION: Significant racial, sex-based, and geographic barriers exist in overall access to IT for PE in the United States.

13.
14.
Semin Ophthalmol ; 39(3): 193-200, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38334303

ABSTRACT

BACKGROUND: Imaging plays a pivotal role in eye assessment. With the introduction of advanced machine learning and artificial intelligence (AI), the focus has shifted to imaging datasets in ophthalmology. While disparities and health inequalities hidden within data are well-documented, the ophthalmology field faces specific challenges to the creation and maintenance of datasets. Optical Coherence Tomography (OCT) is useful for the diagnosis and monitoring of retinal pathologies, making it valuable for AI applications. This review aims to identify and compare the landscape of publicly available OCT databases for AI applications. METHODS: We conducted a literature review of OCT and AI articles with publicly accessible datasets, using the PubMed, Scopus, and Web of Science databases. The review retrieved 183 articles, and after full-text analysis, 50 articles were included. From the included articles, 8 publicly available OCT datasets were identified, and their patient demographics and clinical details were extracted for thorough assessment and comparison. RESULTS: The resulting datasets encompass 154,313 images collected from Spectralis, Cirrus HD, Topcon 3D, and Bioptigen devices. These datasets included normal exams, age-related macular degeneration, and diabetic maculopathy, among others. Comprehensive demographic information is available in one dataset, and the USA is the most represented population. DISCUSSION: Current publicly available OCT databases for AI applications exhibit limitations, stemming from their non-representative nature and the lack of comprehensive demographic information. Limited datasets hamper research and equitable AI development. To promote equitable AI algorithmic development in ophthalmology, there is a need for the creation and dissemination of more representative datasets.


Subjects
Artificial Intelligence, Ophthalmology, Humans, Ophthalmology/methods, Tomography, Optical Coherence/methods, Algorithms, Retina/pathology
15.
medRxiv ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38343827

ABSTRACT

Introduction: The Brazilian Multilabel Ophthalmological Dataset (BRSET) addresses the scarcity of publicly available ophthalmological datasets in Latin America. BRSET comprises 16,266 color fundus retinal photos from 8,524 Brazilian patients, aiming to enhance data representativeness and to serve as a research and teaching tool. It contains sociodemographic information, enabling investigations into differential model performance across demographic groups. Methods: Data from three São Paulo outpatient centers yielded demographic and medical information from electronic records, including nationality, age, sex, clinical history, insulin use, and duration of diabetes diagnosis. A retinal specialist labeled images for anatomical features (optic disc, blood vessels, macula), quality control (focus, illumination, image field, artifacts), and pathologies (e.g., diabetic retinopathy). Diabetic retinopathy was graded using the International Clinical Diabetic Retinopathy and Scottish Diabetic Retinopathy Grading scales. Validation used Dino V2 Base for feature extraction, with 70% training and 30% testing subsets. Support Vector Machines (SVM) and Logistic Regression (LR) were employed with weighted training. Performance metrics included area under the receiver operating characteristic curve (AUC) and Macro F1-score. Results: BRSET comprises 65.1% Canon CR2 and 34.9% Nikon NF5050 images. Of the patients, 61.8% are female, and the average age is 57.6 years. Diabetic retinopathy affected 15.8% of patients, across a spectrum of disease severity. Anatomically, 20.2% showed abnormal optic discs, 4.9% abnormal blood vessels, and 28.8% abnormal macula. Models were trained on BRSET in three prediction tasks: "diabetes diagnosis"; "sex classification"; and "diabetic retinopathy diagnosis". Discussion: BRSET is the first multilabel ophthalmological dataset in Brazil and Latin America. It provides an opportunity for investigating model biases by evaluating performance across demographic groups.
The model performance of three prediction tasks demonstrates the value of the dataset for external validation and for teaching medical computer vision to learners in Latin America using locally relevant data sources.

16.
Circulation ; 149(14): e1028-e1050, 2024 04 02.
Article in English | MEDLINE | ID: mdl-38415358

ABSTRACT

A major focus of academia, industry, and global governmental agencies is to develop and apply artificial intelligence and other advanced analytical tools to transform health care delivery. The American Heart Association supports the creation of tools and services that would further the science and practice of precision medicine by enabling more precise approaches to cardiovascular and stroke research, prevention, and care of individuals and populations. Nevertheless, several challenges exist, and few artificial intelligence tools have been shown to improve cardiovascular and stroke care sufficiently to be widely adopted. This scientific statement outlines the current state of the art on the use of artificial intelligence algorithms and data science in the diagnosis, classification, and treatment of cardiovascular disease. It also sets out to advance this mission, focusing on how digital tools and, in particular, artificial intelligence may provide clinical and mechanistic insights, address bias in clinical studies, and facilitate education and implementation science to improve cardiovascular and stroke outcomes. Last, a key objective of this scientific statement is to further the field by identifying best practices, gaps, and challenges for interested stakeholders.


Subjects
Cardiovascular Diseases, Heart Diseases, Stroke, United States, Humans, Artificial Intelligence, American Heart Association, Cardiovascular Diseases/therapy, Cardiovascular Diseases/prevention & control, Stroke/diagnosis, Stroke/prevention & control
17.
PLOS Glob Public Health ; 4(1): e0002513, 2024.
Article in English | MEDLINE | ID: mdl-38241250

ABSTRACT

Artificial intelligence (AI) and machine learning are central components of today's medical environment. The fairness of AI, i.e. the ability of AI to be free from bias, has repeatedly come into question. This study investigates the diversity of members of academia whose scholarship poses questions about the fairness of AI. Articles that combine the topics of fairness, artificial intelligence, and medicine were selected from PubMed, Google Scholar, and Embase using keywords. Eligibility screening and data extraction were done manually and cross-checked by another author for accuracy. Articles were selected for further analysis, cleaned, and organized in Microsoft Excel; spatial diagrams were generated using Tableau Public. Additional graphs were generated using Matplotlib and Seaborn. Linear and logistic regressions were conducted using Python to measure the relationship between funding status, number of citations, and the gender demographics of the authorship team. We identified 375 eligible publications, including research and review articles concerning AI and fairness in healthcare. Analysis of the bibliographic data revealed an overrepresentation of authors who are white, male, and from high-income countries, especially in the roles of first and last author. Additionally, papers whose authors are based in higher-income countries were more likely to be cited more often and published in higher-impact journals. These findings highlight the lack of diversity among the authors in the AI fairness community whose work gains the largest readership, potentially compromising the very impartiality that the AI fairness community is working towards.

18.
PLOS Digit Health ; 3(1): e0000346, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38175828

ABSTRACT

In recent years, technology has been increasingly incorporated within healthcare for the provision of safe and efficient delivery of services. Although this can be attributed to the benefits that can be harnessed, digital technology has the potential to exacerbate and reinforce preexisting health disparities. Previous work has highlighted how sociodemographic, economic, and political factors, termed social determinants of health [SDOH], affect individuals' interactions with digital health systems. However, there is a paucity of literature addressing how the intrinsic design, implementation, and use of technology interact with SDOH to influence health outcomes. Such interactions are termed digital determinants of health [DDOH]. This paper will, for the first time, propose a definition of DDOH and provide a conceptual model characterizing its influence on healthcare outcomes. Specifically, DDOH is implicit in the design of artificial intelligence systems, mobile phone applications, telemedicine, digital health literacy [DHL], and other forms of digital technology. A better appreciation of DDOH by the various stakeholders at the individual and societal levels can be channeled towards policies that are more digitally inclusive. In tandem with ongoing work to minimize the digital divide caused by existing SDOH, further work is necessary to recognize digital determinants as an important and distinct entity.

19.
PLOS Digit Health ; 3(1): e0000417, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38236824

ABSTRACT

The study provides a comprehensive review of OpenAI's Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings like healthcare. A diverse team, including experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics, analyzed the report against established peer review guidelines. The GPT-4 report shows a significant commitment to transparent AI research, particularly in creating a systems card for risk assessment and mitigation. However, it reveals limitations such as restricted access to training data, inadequate confidence and uncertainty estimations, and concerns over privacy and intellectual property rights. Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card. On the other hand, the lack of clarity in training processes and data raises concerns about encoded biases and interests in GPT-4. The report also lacks confidence and uncertainty estimations, crucial in high-risk areas like healthcare, and fails to address potential privacy and intellectual property issues. Furthermore, this study emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations such as improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4's report is a step towards open discussions on LLMs, more extensive interdisciplinary reviews are essential for addressing bias, harm, and risk concerns, especially in high-risk domains. 
The review aims to expand the understanding of LLMs in general and highlights the need for new forms of reflection on how LLMs are reviewed, what data are required for effective evaluation, and how critical issues like bias and risk are addressed.

20.
Crit Care Explor ; 6(1): e1033, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38239408

ABSTRACT

OBJECTIVES: Although illness severity scoring systems are widely used to support clinical decision-making and assess ICU performance, their potential bias across different age, sex, and primary language groups has not been well-studied. DESIGN SETTING AND PATIENTS: We aimed to identify potential bias of Sequential Organ Failure Assessment (SOFA) and Acute Physiology and Chronic Health Evaluation (APACHE) IVa scores via large ICU databases. SETTING/PATIENTS: This multicenter, retrospective study was conducted using data from the Medical Information Mart for Intensive Care (MIMIC) and eICU Collaborative Research Database. SOFA and APACHE IVa scores were obtained from ICU admission. Hospital mortality was the primary outcome. Discrimination (area under receiver operating characteristic [AUROC] curve) and calibration (standardized mortality ratio [SMR]) were assessed for all subgroups. INTERVENTIONS: Not applicable. MEASUREMENTS AND MAIN RESULTS: A total of 196,310 patient encounters were studied. Discrimination for both scores was worse in older patients than in younger patients and in female patients than in male patients. In MIMIC, discrimination of SOFA in patients whose primary language was not English was worse than in English speakers (AUROC 0.726 vs. 0.783, p < 0.0001). Evaluating calibration via SMR showed statistically significant underestimations of mortality, compared with the overall cohort, in the oldest patients for both SOFA and APACHE IVa, in female patients (1.09) for SOFA, and in non-English primary language patients (1.38) for SOFA in MIMIC. CONCLUSIONS: Differences in discrimination and calibration of two scores across varying age, sex, and primary language groups suggest illness severity scores are prone to bias in mortality predictions. Caution must be taken when using them for quality benchmarking and decision-making among diverse real-world populations.
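The two metrics named in this abstract can be illustrated with a short sketch. It assumes the standard definitions (SMR as observed deaths divided by the sum of predicted mortality risks; AUROC as the probability that a patient who died received a higher predicted risk than one who survived, with ties counted as one half). The function names and the toy inputs are hypothetical, and this is not the authors' analysis code.

```python
from typing import Sequence


def standardized_mortality_ratio(died: Sequence[int], predicted_risk: Sequence[float]) -> float:
    """Observed deaths divided by expected deaths (sum of predicted risks).

    A value above 1 means the score underestimated mortality in this group.
    """
    observed = sum(died)
    expected = sum(predicted_risk)
    return observed / expected


def auroc(died: Sequence[int], predicted_risk: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [p for d, p in zip(died, predicted_risk) if d == 1]  # non-survivors
    neg = [p for d, p in zip(died, predicted_risk) if d == 0]  # survivors
    # Count pairs where the non-survivor has the higher risk; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy subgroup with made-up values, for illustration only.
died = [1, 1, 0, 0]
risk = [0.5, 0.25, 0.25, 0.0]
smr = standardized_mortality_ratio(died, risk)
auc = auroc(died, risk)
```

Computing both quantities separately per subgroup (for example by age band, sex, or primary language) and comparing them with the overall cohort is the kind of comparison the abstract describes.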
